feat(rust/sedona-spatial-join) Add partitioned index provider #555
Pull request overview
This PR adds a PartitionedIndexProvider to coordinate the creation and caching of spatial indexes for specified partitions. The provider is integrated into SpatialJoinExec and SpatialJoinStream, replacing the previous direct spatial index building approach. Memory reservations from the build side collection phase are now held by the provider rather than individual indexes. This is a preparatory step for supporting multi-partitioned spatial joins.
Changes:
- Introduced `PartitionedIndexProvider` to manage index creation and caching across partitions
- Refactored `SpatialJoinStream` to use the provider for index access
- Moved memory reservation ownership from spatial indexes to the provider
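To illustrate the "create once, then cache per partition" idea described above, here is a minimal sketch using only the standard library. It is not the code in this PR: `ToyIndex`, `ToyIndexProvider`, `get_or_build`, and `build_count` are hypothetical names, and the real provider is asynchronous and builds actual spatial indexes.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a built spatial index.
#[derive(Debug)]
pub struct ToyIndex {
    pub partition: usize,
    pub num_geometries: usize,
}

/// Hypothetical provider: builds an index on first request and caches it.
pub struct ToyIndexProvider {
    /// Illustrative build-side input: number of geometries per partition.
    build_side: Vec<usize>,
    /// Indexes that have already been built, keyed by partition id.
    cache: Mutex<HashMap<usize, Arc<ToyIndex>>>,
    /// Counts how many times an index was actually built.
    builds: AtomicUsize,
}

impl ToyIndexProvider {
    pub fn new(build_side: Vec<usize>) -> Self {
        Self {
            build_side,
            cache: Mutex::new(HashMap::new()),
            builds: AtomicUsize::new(0),
        }
    }

    /// Returns the cached index for `partition`, building it on first use.
    /// Returns `None` if the partition does not exist.
    pub fn get_or_build(&self, partition: usize) -> Option<Arc<ToyIndex>> {
        let num_geometries = *self.build_side.get(partition)?;
        let mut cache = self.cache.lock().unwrap();
        if let Some(index) = cache.get(&partition) {
            return Some(Arc::clone(index));
        }
        self.builds.fetch_add(1, Ordering::SeqCst);
        let index = Arc::new(ToyIndex { partition, num_geometries });
        cache.insert(partition, Arc::clone(&index));
        Some(index)
    }

    /// Number of indexes built so far (cache hits are not counted).
    pub fn build_count(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

In this sketch, two calls to `get_or_build` for the same partition return the same `Arc`, so several consumers can share one index without rebuilding it.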
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| rust/sedona-spatial-join/src/utils/disposable_async_cell.rs | New utility for async cell that can be disposed to avoid unnecessary memory usage |
| rust/sedona-spatial-join/src/utils/bbox_sampler.rs | Removed `#![allow(unused)]` attribute |
| rust/sedona-spatial-join/src/utils.rs | Added disposable_async_cell module |
| rust/sedona-spatial-join/src/stream.rs | Updated to use PartitionedIndexProvider for index creation |
| rust/sedona-spatial-join/src/prepare.rs | New module for preparing spatial join components including the provider |
| rust/sedona-spatial-join/src/lib.rs | Replaced build_index module with prepare module |
| rust/sedona-spatial-join/src/index/spatial_index_builder.rs | Removed memory reservation tracking from builder |
| rust/sedona-spatial-join/src/index/spatial_index.rs | Removed memory reservation field from SpatialIndex |
| rust/sedona-spatial-join/src/index/partitioned_index_provider.rs | New provider for managing partitioned spatial indexes |
| rust/sedona-spatial-join/src/index/memory_plan.rs | New module for computing memory usage plans |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Added accessor method for spill metrics |
| rust/sedona-spatial-join/src/index.rs | Added new modules to index module |
| rust/sedona-spatial-join/src/exec.rs | Updated to create and use PartitionedIndexProvider |
| rust/sedona-spatial-join/src/build_index.rs | Removed memory pool parameter from index builder |
| rust/sedona-spatial-join/Cargo.toml | Added tokio dependency |
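The table mentions a cell that "can be disposed to avoid unnecessary memory usage". As a rough illustration of that idea only, the sketch below shows a blocking analogue built on `Mutex` and `Condvar`. The real utility in this PR is asynchronous and its API is not shown on this page, so `DisposableCell`, `set`, `get`, and `dispose` are assumed names, not the actual interface.

```rust
use std::sync::{Condvar, Mutex};

enum State<T> {
    Empty,
    Ready(T),
    Disposed,
}

/// Hypothetical blocking analogue of a disposable cell.
pub struct DisposableCell<T> {
    state: Mutex<State<T>>,
    changed: Condvar,
}

impl<T: Clone> DisposableCell<T> {
    pub fn new() -> Self {
        Self {
            state: Mutex::new(State::Empty),
            changed: Condvar::new(),
        }
    }

    /// Stores a value unless the cell was already set or disposed.
    /// Returns `true` if the value was stored.
    pub fn set(&self, value: T) -> bool {
        let mut state = self.state.lock().unwrap();
        if matches!(*state, State::Empty) {
            *state = State::Ready(value);
            self.changed.notify_all();
            true
        } else {
            false
        }
    }

    /// Blocks until a value is set or the cell is disposed.
    /// Returns `None` once the cell has been disposed.
    pub fn get(&self) -> Option<T> {
        let guard = self.state.lock().unwrap();
        let state = self
            .changed
            .wait_while(guard, |s| matches!(s, State::Empty))
            .unwrap();
        match &*state {
            State::Ready(value) => Some(value.clone()),
            _ => None,
        }
    }

    /// Drops any stored value so its memory can be released, and wakes waiters.
    pub fn dispose(&self) {
        *self.state.lock().unwrap() = State::Disposed;
        self.changed.notify_all();
    }
}
```

The point of the `Disposed` state in this sketch is that the cell stops holding on to the value once nobody needs it, while late callers get a clear `None` instead of waiting forever.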
paleolimbot
left a comment
🚀 !
paleolimbot
left a comment
Thank you!
This patch adds an index provider for coordinating the creation of spatial indexes for specified partitions. It is also integrated into `SpatialJoinExec`, so we use it to create the spatial index even when there is only one spatial partition (the degenerate case). Handling for multiple spatial partitions will be added in a subsequent PR.

The memory reservations grown in the build side collection phase are now held by `PartitionedIndexProvider`. Spatial indexes created by the provider do not need to hold memory reservations.

The next step is to support a partitioned probe side by adding a `PartitionedProbeStreamProvider` and modifying the state machine of `SpatialJoinStream` to process multiple spatial partitions sequentially.
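The ownership change for memory reservations can be pictured with a small standard-library sketch. Everything here is hypothetical (`ToyPool`, `ToyReservation`, `PlainIndex`, `ReservationHoldingProvider`); the actual code presumably uses the query engine's memory pool and reservation types, which are not reproduced here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical memory pool that only tracks the number of reserved bytes.
#[derive(Default)]
pub struct ToyPool {
    reserved: AtomicUsize,
}

impl ToyPool {
    pub fn reserved(&self) -> usize {
        self.reserved.load(Ordering::SeqCst)
    }
}

/// Hypothetical reservation: returns its bytes to the pool when dropped.
pub struct ToyReservation {
    pool: Arc<ToyPool>,
    bytes: usize,
}

impl ToyReservation {
    pub fn new(pool: &Arc<ToyPool>, bytes: usize) -> Self {
        pool.reserved.fetch_add(bytes, Ordering::SeqCst);
        Self { pool: Arc::clone(pool), bytes }
    }
}

impl Drop for ToyReservation {
    fn drop(&mut self) {
        self.pool.reserved.fetch_sub(self.bytes, Ordering::SeqCst);
    }
}

/// Index with no reservation field of its own.
pub struct PlainIndex {
    pub partition: usize,
}

/// Provider that owns the reservations grown while collecting the build side.
pub struct ReservationHoldingProvider {
    _reservations: Vec<ToyReservation>,
}

impl ReservationHoldingProvider {
    pub fn new(reservations: Vec<ToyReservation>) -> Self {
        Self { _reservations: reservations }
    }

    /// Builds an index; the memory stays accounted for by the provider.
    pub fn build_index(&self, partition: usize) -> PlainIndex {
        PlainIndex { partition }
    }
}
```

In this sketch, dropping an index leaves the pool's accounting unchanged, and the reserved bytes are released only when the provider itself is dropped.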